Fault Management in Distributed Systems
Abstract
In the past decade, distributed systems have rapidly evolved, from simple client/server applications in local area networks, to Internet-scale peer-to-peer networks and large-scale cloud platforms deployed on tens of thousands of nodes across multiple administrative domains and geographical areas. Despite their growing popularity, designing and implementing distributed systems remains challenging, due to their ever-increasing scale and the complexity and unpredictability of system executions. Fault management strengthens the robustness and security of distributed systems by detecting malfunctions or violations of desired properties, diagnosing the root causes, and maintaining verifiable evidence to demonstrate the diagnosis results. While its importance is well recognized, fault management in distributed systems is notoriously difficult. To address the problem, various mechanisms and systems have been proposed in the past few years. In this report, we present a survey of these mechanisms and systems, and taxonomize them according to the techniques adopted and their application domains. Based on four representative systems (Pip, Friday, PeerReview and TrInc), we discuss various aspects of fault management, including fault detection, fault diagnosis and evidence generation. Their strengths, limitations and application domains are evaluated and compared in detail.
Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-10-03. This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/914
Similar resources
Influence of Fault Current Limiter in Voltage Drop and TRV Considering Wind Farm
Distributed generation systems connected to distribution systems can increase the level of short-circuit current. The effectiveness of distributed generation systems is affected by the size, location, and type of the distributed generation technology, and by the method of connection to the distribution system. A wind turbine system is one example of a distributed generation source. Not only does...
A Genetic Based Resource Management Algorithm Considering Energy Efficiency in Cloud Computing Systems
Cloud computing is a result of the continuing progress made in hardware, Internet-related technologies, distributed computing and automated management. Increasing demand has led to an increase in services, resulting in the establishment of large-scale computing and data centers, along with high operating costs and huge amounts of electrical power consumption. Insuffic...
Multi-Agents Based Reference Model for Fault Management System in Industrial Processes
Industrial needs now call for global management procedures that integrate information systems in order to manage and use information from the controlled processes and thus ensure good process behaviour. These needs motivate the development of fault detection and diagnosis systems and decision-making systems. In this work, a reference model for fault management in industrial processes ...
The Impact of Superconducting Fault Current Limiter Locations on Voltage Sag in Power Distribution System
In this paper, the impact of installing a superconducting fault current limiter (SFCL) in radial and loop power distribution systems is evaluated as a way to improve voltage sag, both with and without distributed generation (DG). Among various SFCLs, the hybrid type with a superconducting element in parallel with a current limiting reactor (CLR) is selected. This is more effective than resisto...
Lazy fault tolerance-a method for dependable distributed systems
We present a new method called Lazy Fault Tolerance for refining the reliability of distributed systems. Lazy Fault Tolerance uses data redundancy, and the data of objects are distributed over computers in accordance with their `nativity'. The data of system management objects, which control the whole system based on the information of each computer, are naturally distributed over all computers. T...
Publication date: 2014